library(here)
library(stringr)
library(magrittr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(atlantisom)

Intro

Current atlantisom functions are designed primarily for age structured modeled groups, because the package was originally scoped to generate simulated data to test fishery stock assessments. It is now clear that simulated data testing can also benefit other model types, including multispecies models and food web models. Therefore, there is a need to develop functions to develop simulated data for lower trophic level biomass pools modeled in Atlantis and also included in food web models such as Rpath.

Where is biomass information for non-age structured groups output? To be parallel to survey sampling for fish, it would be good to have it by model output timestep toutinc and polygon, but depth layer is not necessary at this time.

First check content of the .nc output file:

dir <- here("atlantisoutput/NOBA_sacc_30")
file_nc <- "nordic_runresults_01.nc"

file.nc <- file.path(dir, file_nc)

 # Load ATLANTIS output!
  at_out <- RNetCDF::open.nc(con = file.nc)
  #on.exit(RNetCDF::close.nc(at_out))

    # Get info from netcdf file! (Filestructure and all variable names)
  var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_out)$nvars - 1),
    function(x) RNetCDF::var.inq.nc(at_out, x)$name)
  n_timesteps <- RNetCDF::dim.inq.nc(at_out, 0)$length
  n_boxes     <- RNetCDF::dim.inq.nc(at_out, 1)$length
  n_layers    <- RNetCDF::dim.inq.nc(at_out, 2)$length

Which species are biomass pools, are they identified in the groups.csv?

No, biomass pools are defined as the combination of NumCohorts == 1 and GroupType not FISH, FISH_INVERT, SHARK, BIRD, MAMMAL, REPTILE. Biomass pools groups are among those below according to the Atlantis manual, ch 6. p 106:

BIOMASS POOL GROUPS
Phytoplankton
7) Small Phytoplankton SM_PHY
8) Large Phytoplankton LG_PHY (used in combination with IsSiliconDep to define diatoms)
9) Trichodesmium and Cyanobacteria TRICHO (N fixers)
Other primary producers
10) Microphtybenthos MICROPHTYBENTHOS
11) Dinoflagellates DINOFLAG
12) Phytoben PHYTOBEN (typically used to represent macroalgae)
13) Seagrass SEAGRASS
14) Turf TURF
Zooplankton
15) Small Zooplankton SM_ZOO
16) Medium Zooplankton MED_ZOO
17) Large Zooplankton LG_ZOO
18) Jellyfish JELLIES (in the past represented as LG_ZOO)
Large pelagic invetebrates
19)Cephalopod CEP
20) Prawns PWN
Infauna
21) Small Infauna SM_INF
22) Large Infauna LG_INF
Epibenthic organisms
23) Sediment epibenthic filter feeders SED_EP_FF (often used for bivalves, sponges)
24) Benthic grazers SED_EP_OTHER
25) Mobile epibenthos MOB_EP_OTHER (often used for crabs, lobster, octopus)
26) Corals CORAL
27)Sponges SPONGE
Bacteria
28) Pelagic Bacteria PL_BACT
29) Sediment Bacteria SED_BACT
Obligatory detritus groups
30) Carrion CARRION
31) Labile detritus LAB_DET
32) Refractory detritus REF_DET
33) Additional ice and land based groups are also available.
Ice dwelling
34) ICE_BACT
35) ICE_MIXOTROPHS
36) ICE_DIATOMS
37) ICE_ZOOBIOTA
Land dwelling (vegetation only for now)
38) MARSH
39) MANGROVE

For NOBA Atlantis, these are the biomass pools:

fgs <- atlantisom::load_fgs(here("atlantisoutput/NOBA_sacc_30"), "nordic_groups_v04.csv") 

pools <- fgs %>%
  dplyr::filter(NumCohorts==1) %>%
  dplyr::select(Code, Name, Long.Name, NumCohorts, InvertType)

knitr::kable(pools)
Code Name Long.Name NumCohorts InvertType
KCR King_crab Red king crab 1 LG_INF
ZG Gel_zooplankton Gelatinous zooplankton 1 LG_ZOO
ZL Large_zooplankton Large zooplankton 1 LG_ZOO
ZM Medium_zooplankton Medium zooplankton 1 MED_ZOO
ZS Small_zooplankton Small zooplankton 1 SM_ZOO
DF Dinoflag Dinoflagellates 1 DINOFLAG
PS Small_phytop Small phytoplankton 1 SM_PHY
PL Large_phytop Large phytoplankton 1 LG_PHY
BC Predatory_ben Predatory benthos 1 LG_INF
BD Detrivore_ben Detrivore benthos 1 LG_INF
BFF Benthic_filterf Benthic filter feeders 1 SED_EP_FF
SPO Sponges Sponges 1 SED_EP_FF
COR Corals Corals 1 SED_EP_FF
PB Pel_bact Pelagic bacteria 1 PL_BACT
BB Ben_bact Sediment bacteria 1 SED_BACT
DR Ref_det Refractory detritus 1 REF_DET
DL Lab_det Labile detritus 1 LAB_DET
DC Carrion Carrion 1 CARRION

Here are the relevant outputs in the output.nc file for these groups:

# return everything in var_names_ncdf matching Name in pools

var_names_ncdf[str_detect(var_names_ncdf, str_c(pools$Name, collapse ="|"))]
##  [1] "Large_phytop_N"        "Small_phytop_N"        "Dinoflag_N"           
##  [4] "King_crab_N"           "Gel_zooplankton_N"     "Large_zooplankton_N"  
##  [7] "Medium_zooplankton_N"  "Small_zooplankton_N"   "Pel_bact_N"           
## [10] "Ben_bact_N"            "Detrivore_ben_N"       "Predatory_ben_N"      
## [13] "Ref_det_N"             "Lab_det_N"             "Carrion_N"            
## [16] "Benthic_filterf_N"     "Sponges_N"             "Corals_N"             
## [19] "Benthic_filterf_Cover" "Sponges_Cover"         "Corals_Cover"

So we want “N” nitrogen output. In theory this can be achieved with atlantisom::load_ncwith select_variable = "N". Looks like it works, output is N for our selected biomass pool species over all polygons, layers, and times. Output of unique(testNpools$species):

select_groups <- pools$Name
nc_out <- file_nc
bps <- load_bps(dir = dir, fgs = "nordic_groups_v04.csv", file_init = "nordic_biol_v23.nc")
  # Get the boundary boxes
  allboxes <- load_box(dir = dir, file_bgm = "Nordic02.bgm")
  boxes <- get_boundary(allboxes)



testNpools <- load_nc(dir = dir,
                  file_nc = nc_out,
                  bps = bps,
                  fgs = fgs,
                  select_groups = select_groups,
                  select_variable = "N",
                  check_acronyms = TRUE,
                  bboxes = boxes)
## [1] "Read /Users/sarah.gaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/NOBA_sacc_30/nordic_runresults_01.nc successfully"
unique(testNpools$species)
##  [1] "Benthic_filterf"    "Sponges"            "Corals"            
##  [4] "King_crab"          "Gel_zooplankton"    "Large_zooplankton" 
##  [7] "Medium_zooplankton" "Small_zooplankton"  "Dinoflag"          
## [10] "Small_phytop"       "Large_phytop"       "Predatory_ben"     
## [13] "Detrivore_ben"      "Pel_bact"           "Ben_bact"          
## [16] "Ref_det"            "Lab_det"            "Carrion"

I don’t think we have a function that aggregates over layers and converts N to biomass but there are pieces in other functions.

New atlantisom function: calc_biomass_pool based on calc_biomass_age, but adding the expansion to volume used in calc_pred_cons.

With the following exceptions!

Benthic groups have N expanded to area

LG_PHY type has N expanded to only water column, not sediment layers.

# we need the volume of each layer as input to this
vol <- load_nc_physics(dir = dir,
                         file_nc = nc_out,
                         physic_variables = "volume",
                         aggregate_layers = FALSE,
                         bboxes = boxes)

# we need area for the benthic inverts instead of volume
# THANK YOU JOE for this tip
area <- load_boxarea(dir = dir,
                     file_bgm = "Nordic02.bgm")

# we need a list of which group types to expand to area instead of volume
benthos <- c("SED_EP_FF", "SED_EP_OTHER", "SED_BAC", 
             "MOB_EP_OTHER", "LG_INF", "SM_INF", 
             "MICROPHYTBENTHOS", "SEAGRASS")



calc_biomass_pool <- function(pooln, vol, area, fgs, biolprm){
  datalist <- list(pooln)

  # Conversion factor from mg N to t wet-weight
  # should only use conversion for non vertebrates
  # also need volume of cell info
  bio_conv <- biolprm$redfieldcn * biolprm$kgw2d / 1000000000
  
  # get fgs info
    # use grouptype column to allocate
  colnames(fgs) <- tolower(colnames(fgs))

  # check for GroupType or InvertType
  if (!("grouptype" %in% colnames(fgs) | "inverttype"%in% colnames(fgs))) {
    stop(paste("The columns GroupType or InvertType ars not in your functional\n",
               "groups file."))
  }

  # change inverttype to grouptype, contents should be the same
  if("inverttype" %in% names(fgs)) names(fgs)[names(fgs) == 'inverttype'] <- 'grouptype'

  fgs$grouptype <- tolower(fgs$grouptype)

  
   pooltype <- fgs |>
    dplyr::select(species=name, grouptype) |>
    dplyr::mutate(pooltype = dplyr::case_when((grouptype %in% c("lg_inf",
                                                                "sm_inf",
                                                                "sed_bact",
                                                                "sed_ep_ff",
                                                                "sed_ep_other",
                                                                "mob_ep_other",
                                                                "coral",
                                                                "sponge")) ~ "benthos",
                                              (grouptype %in% c("lg_phy")) ~ "lg_phy",
                                              TRUE ~ "alllayers"))

  data_names <- c("species", "agecl", "polygon", "layer", "time", "atoutput")

  if (all(sapply(datalist, function(x) all(is.element(names(x), data_names))))){

    names(vol)[names(vol) == "atoutput"] <- "vol"

    sedlayer <- max(vol$layer)

    pooln <- merge(pooln, vol,
                   by = c("time", "polygon", "layer"))

    pooln <- merge(pooln, area,
                   by = c("polygon"))

    pooln <- merge(pooln, pooltype,
                   by = c("species"))

    #pooln$atoutput <- with(pooln, vol * atoutput * bio_conv)

    pooln <- pooln |>
      dplyr::mutate(atoutput = dplyr::case_when(pooltype == "alllayers" ~ vol * atoutput * bio_conv,
                                                pooltype == "lg_phy" ~
                                                  ifelse(layer!=sedlayer, vol * atoutput * bio_conv, 0),
                                                pooltype == "benthos" ~
                                                  ifelse(layer==sedlayer, area * atoutput * bio_conv, 0)))

    # Sum over layers 
    biomass_pools <- aggregate(atoutput ~ species + agecl + time + polygon,
      data = pooln, sum)
  } else {
    stop(paste("Dataframe names do not match with", data_names))
  }

  return(biomass_pools)
}

biolprm <- load_biolprm(dir = dir, file_biolprm = "nordic_biol_incl_harv_v_011_1skg.prm")
## Warning in `[<-.data.frame`(`*tmp*`, , 3:(maxmat + 2), value = structure(list(:
## provided 20 variables to replace 10 variables
biomass_pools <- calc_biomass_pool(pooln = testNpools,
                                   vol = vol,
                                   area = area,
                                   fgs = fgs,
                                   biolprm = biolprm)

Do the new biomass pools match the BiomIndx.txt output when aggregated over polygon? If so they can be the input to the survey function for time and spatial subsetting.

aggbiopools <- biomass_pools %>%
  dplyr::group_by(species, agecl, time) %>%
  dplyr::summarise(totpool = sum(atoutput))
## `summarise()` has grouped output by 'species', 'agecl'. You can override using
## the `.groups` argument.
txtbiopools <- atlantisom::load_bioind(dir, "nordic_runresults_01BiomIndx.txt", fgs) %>%
  dplyr::mutate(time = time/73) %>%
  dplyr::filter(species %in% unique(aggbiopools$species)) %>%
  dplyr::select(species, time, atoutput)

ggplot(txtbiopools, aes(x=time, y=atoutput)) +
  geom_line() +
  geom_point(data = aggbiopools, aes(x = time, y = totpool), colour = "blue", alpha=0.1) +
  theme_bw() +
  facet_wrap(~species, scales = "free_y")

Diagnose mismatches between biopools and BiomInd.txt output

Who matches exactly (enough). Patterns in absolute mismatch are interesting, some species are really close and some always way off. Volume wrong, depth layers wrong? Mine are always higher than the txt output when they are off, so maybe too much expansion somewhere.

Joe pointed out that benthic groups should be expanded to area rather than volume. That correction has now been applied, and matches are better, but still diverge over time for some groups. Not enough to worry about I think.

Originally, large phytoplankton was way off, but Joe also pointed out that LG_PHY N can be in the sediment layer but shouldn’t “count” as biomass. Therefore, this group is expanded to all layers but the sediment layer and that creates a much better match.

mismatch <- txtbiopools |>
  dplyr::left_join(aggbiopools) |>
  dplyr::mutate(mismatch = atoutput - totpool,
                bad = ifelse(abs(mismatch) > 0.01*(atoutput), 1, 0))
## Joining, by = c("species", "time")
ggplot(mismatch, aes(x=time, y=mismatch)) +
  geom_line() +
  theme_bw() +
  facet_wrap(~species, scales = 'free_y')

This is mismatch coded as 0 if less than 1% of txt output, 1 if more.

ggplot(mismatch, aes(x=time, y=bad)) +
  geom_line() +
  theme_bw() +
  facet_wrap(~species)

Large phytoplankton formerly had the largest magnitude mismatch across all groups, now fixed by not expanding N in the sediment layer for this group type.

ggplot(txtbiopools |> filter(species %in% c("Large_phytop", "Small_phytop")), 
       aes(x=time, y=atoutput)) +
  geom_line() +
  geom_point(data = aggbiopools |> filter(species %in% c("Large_phytop", "Small_phytop")), 
             aes(x = time, y = totpool), colour = "blue") +
  theme_bw() +
  facet_wrap(~species, scales = "free_y")

These groups seem close enough to me now.

ggplot(txtbiopools |> filter(species %in% c("Predatory_ben", "Detrivore_ben")), 
       aes(x=time, y=atoutput)) +
  geom_line() +
  geom_point(data = aggbiopools |> filter(species %in% c("Predatory_ben", "Detrivore_ben")), 
             aes(x = time, y = totpool), colour = "blue", alpha=0.1) +
  theme_bw() +
  facet_wrap(~species, scales = "free_y")

Try NEUS?

dir <- here("atlantisoutput/NEUSv2.1.0")
file_nc <- "neus_output.nc"

file.nc <- file.path(dir, file_nc)

 # Load ATLANTIS output!
  at_out <- RNetCDF::open.nc(con = file.nc)
  #on.exit(RNetCDF::close.nc(at_out))

    # Get info from netcdf file! (Filestructure and all variable names)
  var_names_ncdf <- sapply(seq_len(RNetCDF::file.inq.nc(at_out)$nvars - 1),
    function(x) RNetCDF::var.inq.nc(at_out, x)$name)
  n_timesteps <- RNetCDF::dim.inq.nc(at_out, 0)$length
  n_boxes     <- RNetCDF::dim.inq.nc(at_out, 1)$length
  n_layers    <- RNetCDF::dim.inq.nc(at_out, 2)$length

For NEUS Atlantis, these are the biomass pools:

fgs <- atlantisom::load_fgs(here("atlantisoutput/NEUSv2.1.0"), "neus_groups.csv") 

pools <- fgs %>%
  dplyr::filter(NumCohorts==1) %>%
  dplyr::select(Code, Name, LongName, NumCohorts, GroupType)

knitr::kable(pools)
Code Name LongName NumCohorts GroupType
SCA Scallop Sea scallop 1 SED_EP_FF
QHG Quahog Ocean quahog 1 SED_EP_FF
CLA Surf_Clam Atlantic surf clam 1 SED_EP_FF
BFF Filter_Other Other benthic filter feeder 1 SED_EP_FF
BG Benthic_grazer Benthic grazer 1 SED_EP_OTHER
LOB Lobster Lobster 1 MOB_EP_OTHER
RCB Red_Crab Red deep-sea crab 1 MOB_EP_OTHER
BMS Macrobenth_Shallow Shallow macrozoobenthos 1 MOB_EP_OTHER
ZL Carniv_Zoo Carnivorous zooplankton 1 LG_ZOO
BD Deposit_Feeder Deposit Feeder 1 LG_INF
MA Macroalgae Macroalgae 1 PHYTOBEN
MB MicroPB Microphtybenthos 1 MICROPHTYBENTHOS
SG Seagrass Seagrass 1 SEAGRASS
BC Benthic_Carniv Benthic Carnivore 1 MOB_EP_OTHER
ZG Gelat_Zoo Gelatinous zooplankton 1 LG_ZOO
PL Diatom Diatom 1 LG_PHY
DF DinoFlag Dinoflagellates 1 DINOFLAG
PS PicoPhytopl Pico-phytoplankton 1 SM_PHY
ZM Zoo Mesozooplankton 1 MED_ZOO
ZS MicroZoo Microzooplankton 1 SM_ZOO
PB Pelag_Bact Pelagic Bacteria 1 PL_BACT
BB Sed_Bact Sediment Bacteria 1 SED_BACT
BO Meiobenth Meiobenthos 1 SM_INF
DL Lab_Det Labile detritus 1 LAB_DET
DR Ref_Det Refractory detritus 1 REF_DET
DC Carrion Carrion 1 CARRION

Here are the relevant outputs in the output.nc file for these groups:

# return everything in var_names_ncdf matching Name in pools

var_names_ncdf[str_detect(var_names_ncdf, str_c(pools$Name, collapse ="|"))]
##  [1] "Carniv_Zoo_N"         "Carrion_N"            "Deposit_Feeder_N"    
##  [4] "Diatom_N"             "Diatom_S"             "DinoFlag_N"          
##  [7] "Gelat_Zoo_N"          "Lab_Det_N"            "Meiobenth_N"         
## [10] "MicroPB_N"            "MicroPB_S"            "MicroZoo_N"          
## [13] "Pelag_Bact_N"         "PicoPhytopl_N"        "Ref_Det_N"           
## [16] "Sed_Bact_N"           "Zoo_N"                "Benthic_Carniv_N"    
## [19] "Benthic_grazer_N"     "Filter_Other_Cover"   "Filter_Other_N"      
## [22] "Lobster_N"            "Macroalgae_Cover"     "Macroalgae_N"        
## [25] "Macrobenth_Shallow_N" "MicroPB_Cover"        "Quahog_Cover"        
## [28] "Quahog_N"             "Scallop_Cover"        "Scallop_N"           
## [31] "Seagrass_Cover"       "Seagrass_N"           "Surf_Clam_Cover"     
## [34] "Surf_Clam_N"

NEUS has some Cover types as well as N outputs.

select_groups <- pools$Name
nc_out <- file_nc
bps <- load_bps(dir = dir, fgs = "neus_groups.csv", file_init = "neus_init.nc")
  # Get the boundary boxes
  allboxes <- load_box(dir = dir, file_bgm = "neus_tmerc_RM2.bgm")
  boxes <- get_boundary(allboxes)



testNpools <- load_nc(dir = dir,
                  file_nc = nc_out,
                  bps = bps,
                  fgs = fgs,
                  select_groups = select_groups,
                  select_variable = "N",
                  check_acronyms = TRUE,
                  bboxes = boxes)
## Warning in load_nc(dir = dir, file_nc = nc_out, bps = bps, fgs = fgs, select_groups = select_groups, : Some selected groups are not active in the model run. Check 'IsTurnedOn' in fgs
##  Red_Crab
## [1] "Read /Users/sarah.gaichas/Documents/0_Data/Atlantis/Poseidon/poseidon-dev/atlantisoutput/NEUSv2.1.0/neus_output.nc successfully"
unique(testNpools$species)
##  [1] "Scallop"            "Quahog"             "Surf_Clam"         
##  [4] "Filter_Other"       "Benthic_grazer"     "Lobster"           
##  [7] "Macrobenth_Shallow" "Macroalgae"         "Seagrass"          
## [10] "Benthic_Carniv"     "Carniv_Zoo"         "Deposit_Feeder"    
## [13] "MicroPB"            "Gelat_Zoo"          "Diatom"            
## [16] "DinoFlag"           "PicoPhytopl"        "Zoo"               
## [19] "MicroZoo"           "Pelag_Bact"         "Sed_Bact"          
## [22] "Meiobenth"          "Lab_Det"            "Ref_Det"           
## [25] "Carrion"
# we need the volume of each layer as input to this
vol <- load_nc_physics(dir = dir,
                         file_nc = nc_out,
                         physic_variables = "volume",
                         aggregate_layers = FALSE,
                         bboxes = boxes)

# we need area for the benthic inverts instead of volume
# THANK YOU JOE for this tip
area <- load_boxarea(dir = dir,
                     file_bgm = "neus_tmerc_RM2.bgm")

biolprm <- load_biolprm(dir = dir, file_biolprm = "at_biology.prm")
## Warning in `[<-.data.frame`(`*tmp*`, , 3:(maxmat + 2), value = structure(list(:
## provided 20 variables to replace 10 variables
## Warning in load_biolprm(dir = dir, file_biolprm = "at_biology.prm"): NAs
## introduced by coercion
biomass_pools <- calc_biomass_pool(pooln = testNpools,
                                   vol = vol,
                                   area = area,
                                   fgs = fgs,
                                   biolprm = biolprm)

Do the new biomass pools match the BiomIndx.txt output when aggregated over polygon? If so they can be the input to the survey function for time and spatial subsetting.

aggbiopools <- biomass_pools %>%
  dplyr::group_by(species, agecl, time) %>%
  dplyr::summarise(totpool = sum(atoutput))
## `summarise()` has grouped output by 'species', 'agecl'. You can override using
## the `.groups` argument.
txtbiopools <- atlantisom::load_bioind(dir, "neus_outputBiomIndx.txt", fgs) %>%
  dplyr::mutate(time = time/73) %>%
  dplyr::filter(species %in% unique(aggbiopools$species)) %>%
  dplyr::select(species, time, atoutput)

ggplot(txtbiopools, aes(x=time, y=atoutput)) +
  geom_line() +
  geom_point(data = aggbiopools, aes(x = time, y = totpool), colour = "blue", alpha=0.1) +
  theme_bw() +
  facet_wrap(~species, scales = "free_y")

mismatch

mismatch <- txtbiopools |>
  dplyr::left_join(aggbiopools) |>
  dplyr::mutate(mismatch = atoutput - totpool,
                bad = ifelse(abs(mismatch) > 0.01*(atoutput), 1, 0))
## Joining, by = c("species", "time")
ggplot(mismatch, aes(x=time, y=mismatch)) +
  geom_line() +
  theme_bw() +
  facet_wrap(~species, scales = 'free_y')

This is mismatch coded as 0 if less than 1% of txt output, 1 if more.

ggplot(mismatch, aes(x=time, y=bad)) +
  geom_line() +
  theme_bw() +
  facet_wrap(~species)

Learned that we should not hardcode the sediment layer! Added a sedlayer variable to the function.

With corrections for LG_PHY things now line up in both models.

ggplot(txtbiopools |> filter(species %in% c("Diatom", "DinoFlag")), 
       aes(x=time, y=atoutput)) +
  geom_line() +
  geom_point(data = aggbiopools |> filter(species %in% c("Diatom", "DinoFlag")), 
             aes(x = time, y = totpool), colour = "blue") +
  theme_bw() +
  facet_wrap(~species, scales = "free_y")

Remaining changes to workflow, once corrected

Add calc_biomass_pool to run_truth

With an if statement?

Test wrapper functions with biomass pool groups

Only need species and index here, comps don’t apply.